Sampling Based Approaches to Handle Imbalances in Network Traffic Dataset for Machine Learning Techniques

نویسندگان

  • Raman Singh
  • Harish Kumar
  • R. K. Singla
چکیده

Network traffic data is huge, varying and imbalanced because various classes are not equally distributed. Machine learning (ML) algorithms for traffic analysis uses the samples from this data to recommend the actions to be taken by the network administrators as well as training. Due to imbalances in dataset, it is difficult to train machine learning algorithms for traffic analysis and these may give biased or false results leading to serious degradation in performance of these algorithms. Various techniques can be applied during sampling to minimize the effect of imbalanced instances. In this paper various sampling techniques have been analysed in order to compare the decrease in variation in imbalances of network traffic datasets sampled for these algorithms. Various parameters like missing classes in samples, probability of sampling of the different instances have been considered for comparison.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Issues Related to Sampling Techniques for Network Traffic Dataset

Network traffic data is huge, varying and imbalanced because various classes are not equally distributed. Machine learning (ML) algorithms for traffic analysis uses the samples from this data to recommend the actions to be taken by the network administrators. Due to imbalances in dataset, machine learning algorithms may give biased or false results leading to serious degradation in performance ...

متن کامل

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

متن کامل

Machine learning algorithms in air quality modeling

Modern studies in the field of environment science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. The recent advances in statistical modeling based on machine learning approaches have emerged as solution to tackle these issues. It is a fact that, input variable type largely affec...

متن کامل

Convergence Optimization of Backpropagation Artificial Neural Network Used for Dichotomous Classification of Intrusion Detection Dataset

There are distinguished two categories of intrusion detection approaches utilizing machine learning according to type of input data. The first one represents network intrusion detection techniques which consider only data captured in network traffic. The second one represents general intrusion detection techniques which intake all possible data sources including host-based features as well as n...

متن کامل

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1311.2677  شماره 

صفحات  -

تاریخ انتشار 2013